Speech driven 3-d face point trajectory synthesis algorithm
نویسندگان
چکیده
This paper presents a novel algorithm which generates three-dimensional face point trajectories for a given speech le with or without its text. The proposed algorithm rst employs an o -line training phase. In this phase, recorded face point trajectories along with their speech data and phonetic labels are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual features. Acoustics are represented by Line spectral frequencies (LSF), and face points are represented with their principal components (PC). During the synthesis stage, speech input is rated in terms of its similarity to the codebook entries. Based on the similarity, each codebook entry is assigned a weighting coe cient. If the phonetic information about the test speech is available, this is utilized in restricting the codebook search to only several codebook entries which are visually closest to the current phoneme (a visual phoneme similarity matrix is generated for this purpose). Then these weights are used to synthesize the principal components of the face point trajectory. The performance of the algorithm is tested on held-out data, and the synthesized face point trajectories showed a correlation of 0.73 with true face point trajectories.
منابع مشابه
3-D Face Point Trajectory Synthesis Using An Automatically Derived Visual Phoneme Similarity Matrix
This paper presents a novel algorithm which generates three-dimensional face point trajectories for a given speech le with or without its text. The proposed algorithm rst employs an o -line training phase. In this phase, recorded face point trajectories along with their speech data and phonetic labels are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual f...
متن کاملCodebook Based Face Point Trajectory Synthesis Algo - rithm Using Speech
This paper presents a novel algorithm which generates three-dimensional face point trajectories for a given speech le with or without its text. The proposed algorithm rst employs an oo-line training phase. In this phase, recorded face point trajectories along with their speech data and phonetic labels are used to generate phonetic codebooks. These codebooks consist of both acoustic and visual f...
متن کاملSpeaker-independent 3D face synthesis driven by speech and text
In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker independent and can be driven by audio or both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...
متن کاملModeling and Synthesis of Facial Motion Driven by Speech
We introduce a novel approach to modeling the dynamics of human facial motion induced by the action of speech for the purpose of synthesis. We represent the trajectories of a number of salient features on the human face as the output of a dynamical system made up of two subsystems, one driven by the deterministic speech input, and a second driven by an unknown stochastic input. Inference of the...
متن کاملRobot Arm Performing Writing through Speech Recognition Using Dynamic Time Warping Algorithm
This paper aims to develop a writing robot by recognizing the speech signal from the user. The robot arm constructed mainly for the disabled people who can’t perform writing on their own. Here, dynamic time warping (DTW) algorithm is used to recognize the speech signal from the user. The action performed by the robot arm in the environment is done by reducing the redundancy which frequently fac...
متن کامل